This is the R Notebook for the book 100 Statistical Tests in R (Lewis, Lewis, and Lewis 2013).
In order to better illustrate the statistical tests through real-world applications, this notebook uses built-in R datasets to cover all test types. Run the following code to ensure all packages and datasets are prepared.
library(psych)
# Load core datasets
data(iris)
data(mtcars)
data(airquality)
data(sleep)
data(UCBAdmissions)
data(faithful)
data(AirPassengers)
# For survival analysis
library(lattice)
library(survival)
data(lung)
# For circular statistics
library(circular)
data(wind)
# For visualisation
library(ggplot2)
library(tidyverse)
This test assesses the null hypothesis of zero correlation between two variables. When to use this test?
Interval or ratio scale
No need for same scale
A paired sample with approximately normally distributed values
Joint distribution is bivariate normal
Linear relationships
In this test, we will use the iris dataset to compare and illustrate the relationship between the lengths of sepals and petals.
str(iris)
'data.frame': 150 obs. of 5 variables:
$ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
$ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
$ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
$ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
summary(iris)
Sepal.Length Sepal.Width Petal.Length Petal.Width
Min. :4.300 Min. :2.000 Min. :1.000 Min. :0.100
1st Qu.:5.100 1st Qu.:2.800 1st Qu.:1.600 1st Qu.:0.300
Median :5.800 Median :3.000 Median :4.350 Median :1.300
Mean :5.843 Mean :3.057 Mean :3.758 Mean :1.199
3rd Qu.:6.400 3rd Qu.:3.300 3rd Qu.:5.100 3rd Qu.:1.800
Max. :7.900 Max. :4.400 Max. :6.900 Max. :2.500
Species
setosa :50
versicolor:50
virginica :50
summary(iris$Sepal.Length)
Min. 1st Qu. Median Mean 3rd Qu. Max.
4.300 5.100 5.800 5.843 6.400 7.900
summary(iris$Petal.Length)
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.000 1.600 4.350 3.758 5.100 6.900
cor.test(iris$Sepal.Length, iris$Petal.Length, method = "pearson", alternative = "two.sided", conf.level = 0.95)
Pearson's product-moment correlation
data: iris$Sepal.Length and iris$Petal.Length
t = 21.646, df = 148, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.8270363 0.9055080
sample estimates:
cor
0.8717538
The correlation between x (sepal length) and y (petal length) is 0.872. The p-value is smaller than 0.05, so we reject the null hypothesis of zero correlation. The 95% confidence interval is [0.827, 0.906], which does not cross zero.
plot(iris$Sepal.Length,iris$Petal.Length)
cor_test_iris_Sepal_Petal_length <- ggplot(data = iris, aes(x=Sepal.Length, y=Petal.Length)) +
geom_point()
cor_test_iris_Sepal_Petal_length
When to use this test?
To assess the null hypothesis of zero correlation between two variables.
Ordinal or ranked data
Data are continuous but it is unreasonable to assume the variables are normally distributed
Monotonic relationship
In this test, we will continue to use iris dataset to compare and illustrate the relationship between lengths of sepals and petals.
cor.test(iris$Sepal.Length,iris$Petal.Length, method = "spearman", alternative = "two.sided")
Spearman's rank correlation rho
data: iris$Sepal.Length and iris$Petal.Length
S = 66429, p-value < 2.2e-16
alternative hypothesis: true rho is not equal to 0
sample estimates:
rho
0.8818981
The correlation between x (sepal length) and y (petal length) is 0.882. The p-value is smaller than 0.05, so we reject the null hypothesis of zero correlation. Note that cor.test() does not normally report a confidence interval for the Spearman test: because Spearman's rho is computed from ranks rather than the raw data values, its sampling distribution is more complex, and standard parametric methods for confidence intervals do not apply directly.
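If an interval estimate is still wanted, a percentile bootstrap is a common workaround. This is a rough sketch using base R resampling (the seed and number of replicates are arbitrary choices, not from the book):

```r
# Percentile bootstrap CI for Spearman's rho between sepal and petal length
set.seed(42)
boot_rho <- replicate(2000, {
  idx <- sample(nrow(iris), replace = TRUE)       # resample rows with replacement
  cor(iris$Sepal.Length[idx], iris$Petal.Length[idx], method = "spearman")
})
quantile(boot_rho, c(0.025, 0.975))               # 95% percentile interval
```

The resulting interval sits around the point estimate of 0.882 and, like the Pearson interval above, stays well away from zero.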
When to use this test?
a paired random sample of ordinal or ranked data
data are continuous and it is unreasonable to assume the variables are normally distributed
monotonic relationship
We will use a new dataset mtcars to illustrate the Tau correlation coefficient test:
| [, 1] | mpg | Miles/(US) gallon |
| [, 2] | cyl | Number of cylinders |
| [, 3] | disp | Displacement (cu.in.) |
| [, 4] | hp | Gross horsepower |
| [, 5] | drat | Rear axle ratio |
| [, 6] | wt | Weight (1000 lbs) |
| [, 7] | qsec | 1/4 mile time |
| [, 8] | vs | Engine (0 = V-shaped, 1 = straight) |
| [, 9] | am | Transmission (0 = automatic, 1 = manual) |
| [,10] | gear | Number of forward gears |
| [,11] | carb | Number of carburetors |
str(mtcars)
'data.frame': 32 obs. of 11 variables:
$ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
$ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
$ disp: num 160 160 108 258 360 ...
$ hp : num 110 110 93 110 175 105 245 62 95 123 ...
$ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
$ wt : num 2.62 2.88 2.32 3.21 3.44 ...
$ qsec: num 16.5 17 18.6 19.4 17 ...
$ vs : num 0 0 1 1 0 1 0 1 1 1 ...
$ am : num 1 1 1 0 0 0 0 0 0 0 ...
$ gear: num 4 4 4 3 3 3 3 4 4 4 ...
$ carb: num 4 4 1 1 2 1 4 2 2 4 ...
summary(mtcars)
mpg cyl disp hp
Min. :10.40 Min. :4.000 Min. : 71.1 Min. : 52.0
1st Qu.:15.43 1st Qu.:4.000 1st Qu.:120.8 1st Qu.: 96.5
Median :19.20 Median :6.000 Median :196.3 Median :123.0
Mean :20.09 Mean :6.188 Mean :230.7 Mean :146.7
3rd Qu.:22.80 3rd Qu.:8.000 3rd Qu.:326.0 3rd Qu.:180.0
Max. :33.90 Max. :8.000 Max. :472.0 Max. :335.0
drat wt qsec vs
Min. :2.760 Min. :1.513 Min. :14.50 Min. :0.0000
1st Qu.:3.080 1st Qu.:2.581 1st Qu.:16.89 1st Qu.:0.0000
Median :3.695 Median :3.325 Median :17.71 Median :0.0000
Mean :3.597 Mean :3.217 Mean :17.85 Mean :0.4375
3rd Qu.:3.920 3rd Qu.:3.610 3rd Qu.:18.90 3rd Qu.:1.0000
Max. :4.930 Max. :5.424 Max. :22.90 Max. :1.0000
am gear carb
Min. :0.0000 Min. :3.000 Min. :1.000
1st Qu.:0.0000 1st Qu.:3.000 1st Qu.:2.000
Median :0.0000 Median :4.000 Median :2.000
Mean :0.4062 Mean :3.688 Mean :2.812
3rd Qu.:1.0000 3rd Qu.:4.000 3rd Qu.:4.000
Max. :1.0000 Max. :5.000 Max. :8.000
cor.test(mtcars$hp, mtcars$wt, method = "kendall", alternative = "two.sided")
Kendall's rank correlation tau
data: mtcars$hp and mtcars$wt
z = 4.845, p-value = 1.266e-06
alternative hypothesis: true tau is not equal to 0
sample estimates:
tau
0.6113081
The correlation between x (hp) and y (wt) is 0.611. The p-value is significantly smaller than the critical value 0.05. Therefore, we reject the null hypothesis.
When to use this test?
correlation coefficients calculated from independent samples
data are assumed to be bivariate normal
sample size may be different
The Z-test is less common than the Pearson or Spearman correlation tests. It examines whether correlation coefficients calculated from two independent samples, possibly with different sample sizes, are significantly different from each other.
In this test, we will use the iris dataset as the example. We will first calculate the correlation coefficient between sepal length and petal length for setosa, and the correlation coefficient between sepal width and petal width for versicolor. After that, we will run the Z-test on these two correlation coefficients. We also need the ‘psych’ package for the paired.r() function. This function takes six parameters: the correlation coefficient xy, the correlation coefficient xz, the correlation yz (NULL here), the number of observations in the first group, the number in the second group, and twotailed.
cor_setosa <- cor(iris$Sepal.Length[iris$Species == "setosa"], iris$Petal.Length[iris$Species == "setosa"], method = "pearson")
cor_versicolor <- cor(iris$Sepal.Width[iris$Species == "versicolor"], iris$Petal.Width[iris$Species == "versicolor"], method = "pearson")
n_setosa <- sum(iris$Species == "setosa")
n_versicolor <- sum(iris$Species == "versicolor")
Now we will use paired.r() to calculate the Z-test.
z_test_cor_setosa_versicolor <- paired.r(cor_setosa, cor_versicolor, NULL, n_setosa, n_versicolor, twotailed=TRUE)
z_test_cor_setosa_versicolor$test
[1] "test of difference between two independent correlations"
z_test_cor_setosa_versicolor$z
[1] 2.550422
z_test_cor_setosa_versicolor$p
[1] 0.01075925
The p-value is less than the critical value 0.05. Therefore we can reject the null hypothesis.
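Under the hood, paired.r() Fisher-transforms each correlation and forms a z statistic from their difference; a self-contained sketch of that calculation:

```r
# Fisher z-test for two independent correlations, computed by hand
fisher_z <- function(r) 0.5 * log((1 + r) / (1 - r))   # Fisher's r-to-z transform
r1 <- cor(iris$Sepal.Length[iris$Species == "setosa"],
          iris$Petal.Length[iris$Species == "setosa"])
r2 <- cor(iris$Sepal.Width[iris$Species == "versicolor"],
          iris$Petal.Width[iris$Species == "versicolor"])
n1 <- sum(iris$Species == "setosa")
n2 <- sum(iris$Species == "versicolor")
# standard error of the difference uses n - 3 for each group
z <- (fisher_z(r1) - fisher_z(r2)) / sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
p <- 2 * pnorm(-abs(z))
round(c(z = z, p = p), 4)
```

The magnitude of z and the p-value reproduce the paired.r() output above.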
When to use this test?
To compare the correlation of one pair of variables with the correlation of another pair that shares an overlapping variable
aim to select the better indicator of two available predictors (x and y) for a dependent variable z
data are assumed to be normally distributed
referred to as Steiger’s t-test, Meng’s t-test, Rosenthal & Rubin’s t-test or Williams test
We will use mtcars to evaluate the relationships between the correlation of wt (weights) and mpg (miles per gallon) and the correlation of hp (horsepower) and mpg and figure out whether wt or hp is a better indicator of mpg.
We will need compOverlapCorr and psych packages for this test.
library(compOverlapCorr)
library(psych)
cor_mpg_wt <- cor(mtcars$mpg,mtcars$wt, method = "pearson")
cor_mpg_hp <- cor(mtcars$mpg, mtcars$hp, method = "pearson")
cor_hp_wt <- cor(mtcars$hp, mtcars$wt, method = "pearson")
compOverlapCorr(length(mtcars$mpg), cor_mpg_wt, cor_mpg_hp, cor_hp_wt)
[1] -1.3303850 0.1833915
The first number is the t statistic and the second is the p-value. The p-value is greater than the critical value 0.05. Therefore, we cannot reject the null hypothesis.
We can also use psych package to calculate this correlation.
paired.r(cor_mpg_wt, cor_mpg_hp, cor_hp_wt, length(mtcars$mpg), twotailed = TRUE)
Call: paired.r(xy = cor_mpg_wt, xz = cor_mpg_hp, yz = cor_hp_wt, n = length(mtcars$mpg),
twotailed = TRUE)
[1] "test of difference between two correlated correlations"
t = -1.35 With probability = 0.19
We can see the same results from both packages. As the package {compOverlapCorr} is no longer maintained, we should use {psych} for this test.
Question to the test addresses:
Is the correlation between two variables (A and B) significantly different from the correlation between two other variables (C and D), when all four variables were measured on the same group?
When to use this test?
To assess the difference between one pair of variables and a second non-overlapping pair of variables.
Non-overlapping pairs could be the same items measured at different times
Data are assumed to be normally distributed
Usually, this test is used to compare the correlation between two variables at two different time points in serial analyses. Here, we will discuss the basic form of this test, using two independent sets of variables in the mtcars dataset.
For the cars in this dataset, is the relationship between the engine-power variables (hp and disp) significantly different from the relationship between the performance variables (mpg and wt)?
library("psych")
cor_hp_disp <- cor(mtcars$hp, mtcars$disp, method = "pearson")
cor_mpg_wt <- cor(mtcars$mpg, mtcars$wt, method ="pearson")
#we also need the cross relationships between these variables
cor_hp_wt <- cor(mtcars$hp, mtcars$wt, method = "pearson")
cor_mpg_disp <- cor(mtcars$mpg, mtcars$disp, method = "pearson")
cor_mpg_hp <- cor(mtcars$mpg, mtcars$hp, method = "pearson")
cor_disp_wt <- cor(mtcars$disp, mtcars$wt, method = "pearson")
r.test(length(mtcars$mpg), r12 = cor_hp_wt,r34 = cor_mpg_wt,r23 = cor_mpg_disp, r13 = cor_hp_wt, r14 = cor_hp_wt, r24 = cor_mpg_disp, twotailed = TRUE)
Correlation tests
Call:r.test(n = length(mtcars$mpg), r12 = cor_hp_wt, r34 = cor_mpg_wt,
r23 = cor_mpg_disp, r13 = cor_hp_wt, r14 = cor_hp_wt, r24 = cor_mpg_disp,
twotailed = TRUE)
Test of difference between two dependent correlations
z value 5.52 with probability 0
We can see the p-value is 0, smaller than the critical value 0.05. Therefore, we can reject the null hypothesis: the correlation between the engine-power variables and the correlation between the performance variables are different.
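Note that the call above does not quite match the stated comparison: with the variables numbered 1 = hp, 2 = disp, 3 = mpg, 4 = wt, r12 should be the hp-disp correlation, and each cross-correlation should be distinct. A corrected sketch (its output will differ somewhat from the result shown above, though the conclusion is the same):

```r
# r.test() with the six correlations mapped consistently:
# variables 1 = hp, 2 = disp, 3 = mpg, 4 = wt
library(psych)
res <- r.test(n   = nrow(mtcars),
              r12 = cor(mtcars$hp,   mtcars$disp),  # engine-power pair
              r34 = cor(mtcars$mpg,  mtcars$wt),    # performance pair
              r13 = cor(mtcars$hp,   mtcars$mpg),
              r14 = cor(mtcars$hp,   mtcars$wt),
              r23 = cor(mtcars$disp, mtcars$mpg),
              r24 = cor(mtcars$disp, mtcars$wt),
              twotailed = TRUE)
res
```

Because the two target correlations have opposite signs, the z statistic remains large and the null hypothesis is still rejected.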
Question to the test addresses:
Are these variables essentially independent of each other?
When to use this test?
check whether the correlation matrix is different from the identity matrix
an identity matrix means all variables in the matrix are uncorrelated
to check whether data is suitable for factor analysis (FA)
For researchers who want to conduct factor analysis on their variables, it is important to reject the null hypothesis of Bartlett's test that there is no difference between the variable correlation matrix and the identity matrix. Only when the variables are correlated with each other can we continue with factor analysis.
For this test, we will use iris dataset to test the variable matrix
among Sepal.Length, Sepal.Width,
Petal.Length, Petal.Width.
library("psych")
matrix_data <- iris[, 1:4]
cortest.bartlett(matrix_data)
$chisq
[1] 706.9592
$p.value
[1] 1.92268e-149
$df
[1] 6
According to the result, we can find the p-value is smaller than the
critical value 0.05. Therefore, we should reject the null hypothesis
that our matrix is the identity matrix. We can argue that there are
internal correlations between our matrix among
Sepal.Length, Sepal,Width,
Petal.Length, and Petal.Width.
Question to the test addresses:
Is the fundamental pattern of correlations at A the same as the pattern at B?
To use this test:
Two independent samples or subsamples
The same set of variables measured for both groups
The data are assumed to be normally distributed
In this test, we will use the EuStockMarkets dataset,
which is a built-in dataset in R, measuring the daily closing prices for
4 major EU stock indices from 1991-1998.
We will examine whether the correlation pattern between the DAX, SMI, CAC, and FTSE indices differs between the first half of the time period and the second half.
library("psych")
sample1 <- EuStockMarkets[1:900, ]
sample2 <- EuStockMarkets[961:1860, ]
cortest.jennrich(sample1,sample2)
$chi2
[1] -719.9148
$prob
[1] 1
According to the result, we can find the p-value is larger than the critical value 0.05. Therefore, we can not reject the null hypothesis that the correlation patterns of these two samples are statistically the same. The overall pattern of correlation between the major EU stock indices did not significantly change between the two time periods.
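One caveat: cortest.jennrich() is documented to take two correlation matrices plus the two sample sizes, not raw data, which may explain the negative chi-square above (a chi-square statistic should be non-negative). A corrected call would look like this (its output will differ from the result shown):

```r
# cortest.jennrich() on correlation matrices, with explicit sample sizes
library(psych)
R1  <- cor(EuStockMarkets[1:900, ])     # first-half correlation matrix
R2  <- cor(EuStockMarkets[961:1860, ])  # second-half correlation matrix
res <- cortest.jennrich(R1, R2, n1 = 900, n2 = 900)
res
```

With valid inputs the chi-square is non-negative and the reported probability can be interpreted in the usual way.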
Question to the test addresses:
Is one time series useful in forecasting another time series?
To use this test:
at least two time series data
stationary data: all properties of the data, such as the mean, variance, and autocorrelation, are constant over time
data are assumed to be ordered chronologically
In this test, we will continue to use the time series data EuStockMarkets with the diff() function to create a suitable dataset for this analysis. We will test these relationships:
Does DAX Granger-cause CAC?
Does DAX Granger-cause FTSE?
Does SMI Granger-cause CAC?
Does SMI Granger-cause FTSE?
data("EuStockMarkets")
dax_diff <- diff(EuStockMarkets[,"DAX"])
smi_diff <- diff(EuStockMarkets[,"SMI"])
cac_diff <- diff(EuStockMarkets[,"CAC"])
ftse_diff <- diff(EuStockMarkets[,"FTSE"])
library(lmtest)
grangertest(cac_diff ~ dax_diff, order = 3)
Granger causality test
Model 1: cac_diff ~ Lags(cac_diff, 1:3) + Lags(dax_diff, 1:3)
Model 2: cac_diff ~ Lags(cac_diff, 1:3)
Res.Df Df F Pr(>F)
1 1849
2 1852 -3 0.8165 0.4847
According to the result, we can find that the p-value is larger than the critical value 0.05. Therefore, we cannot reject the null hypothesis that DAX does not Granger-cause CAC. Similarly, we can run the rest of the tests.
grangertest(ftse_diff ~ dax_diff, order = 3)
Granger causality test
Model 1: ftse_diff ~ Lags(ftse_diff, 1:3) + Lags(dax_diff, 1:3)
Model 2: ftse_diff ~ Lags(ftse_diff, 1:3)
Res.Df Df F Pr(>F)
1 1849
2 1852 -3 1.3093 0.2697
grangertest(cac_diff ~ smi_diff, order = 3)
Granger causality test
Model 1: cac_diff ~ Lags(cac_diff, 1:3) + Lags(smi_diff, 1:3)
Model 2: cac_diff ~ Lags(cac_diff, 1:3)
Res.Df Df F Pr(>F)
1 1849
2 1852 -3 3.9466 0.008081 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
grangertest(ftse_diff ~ smi_diff, order = 3)
Granger causality test
Model 1: ftse_diff ~ Lags(ftse_diff, 1:3) + Lags(smi_diff, 1:3)
Model 2: ftse_diff ~ Lags(ftse_diff, 1:3)
Res.Df Df F Pr(>F)
1 1849
2 1852 -3 5.5971 0.0008069 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
According to the tests, we can see that the latter two p-values are significantly smaller than the critical value 0.05. Therefore, we can reject the null hypotheses: SMI Granger-causes both CAC and FTSE.
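Since the Granger test assumes stationary series, we can screen the differenced data with an augmented Dickey-Fuller test. This sketch uses the tseries package, which is not loaded above and is assumed to be installed:

```r
# Stationarity check for a differenced series with the ADF test
library(tseries)
data("EuStockMarkets")
cac_diff <- diff(EuStockMarkets[, "CAC"])
res <- adf.test(cac_diff)   # H0: the series has a unit root (non-stationary)
res
```

A small p-value here supports treating the differenced series as stationary, which is why we differenced the price levels before running grangertest().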
We can also visualise these results to “see” the relationships. We will use the ggplot2 and zoo packages.
library(ggplot2)
library(zoo)
plot_data <- data.frame(
Date = 1:length(dax_diff),
SMI = as.numeric(smi_diff),
CAC = as.numeric(cac_diff),
FTSE = as.numeric(ftse_diff)
)
ggplot(plot_data, aes(x=Date)) +
geom_line(aes(y=SMI, color = "SMI")) +
geom_line(aes(y=CAC, color="CAC")) +
labs(title = "Daily Changes in SMI and CAC", y = "Changes in Price", x="Changes in Day") +
theme_minimal()
ggplot(plot_data, aes(x=Date)) +
geom_line(aes(y=SMI, color = "SMI")) +
geom_line(aes(y=FTSE, color="FTSE")) +
labs(title = "Daily Changes in SMI and FTSE", y = "Changes in Price", x="Changes in Day") +
theme_minimal()
Question to the test addresses:
Is there serial correlation in the sample?
To use this test:
whether residuals (error terms) from a regression model are independent
residuals are assumed to be stationary and normally distributed with zero mean
If residuals are not independent, the model will have incorrect standard errors, misleading p-values and false positives. The error from one observation provides information about the error in the next observation, which violates the assumption of linear regression.
For this test, we will use the cars dataset to build a simple linear regression model and test the relationship dist ~ speed.
data("cars")
car_model <- lm(dist ~ speed, data = cars)
library("car")
durbinWatsonTest(car_model)
lag Autocorrelation D-W Statistic p-value
1 0.1604322 1.676225 0.194
Alternative hypothesis: rho != 0
According to the result, we can see that the p-value is larger than the critical value 0.05. Therefore, we cannot reject the null hypothesis that there is no autocorrelation in this model.
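The same test is also available as dwtest() in the lmtest package used elsewhere in this notebook. The D-W statistic is identical, though its p-value is computed analytically rather than by simulation, so it differs from the bootstrap p-value of durbinWatsonTest():

```r
# Durbin-Watson test via lmtest::dwtest() as a cross-check
library(lmtest)
data("cars")
res <- dwtest(lm(dist ~ speed, data = cars))
res
```

Either way, the DW statistic of about 1.68 is close to 2, the value expected under no first-order autocorrelation.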
To visualise the residuals, we can plot each residual against the one that came before it (the lagged value). If there is no autocorrelation, we should see a random cloud of points with no observable pattern.
res <- residuals(car_model)
residual_data <- data.frame(
Residuals = res[-1],
Lagged_Residuals = res[-length(res)]
)
ggplot(residual_data, aes(x = Lagged_Residuals, y= Residuals)) +
geom_point() +
geom_hline(yintercept = 0) +
geom_vline(xintercept = 0) +
labs(title = "Plot of residuals and lagged residuals", x = "lagged residuals", y = "residuals")
Question to the test addresses:
Is there serial correlation in the sample?
To use this test:
the model includes lagged values of the dependent variable
residuals are assumed to be stationary and normally distributed with zero mean
more powerful and general than the Durbin-Watson test
This test is used when there are lagged values of the dependent variable in the model, in which case the Durbin-Watson test cannot be used; examples include the growth of a city's population or stock prices in a market. We will use EuStockMarkets to examine this test, looking at the daily changes of the DAX.
library(lmtest)
dax_df <- data.frame(DAX_change = as.numeric(dax_diff))
dax_df <- dax_df %>%
mutate(DAX_change_lag1 = lag(DAX_change, 1))
dynamic_model <- lm(DAX_change ~ DAX_change_lag1, data = dax_df)
bgtest(dynamic_model, order = 3)
Breusch-Godfrey test for serial correlation of order up to 3
data: dynamic_model
LM test = 0.65409, df = 3, p-value = 0.8839
According to this result, the p-value is larger than the critical value 0.05. Therefore, we cannot reject the null hypothesis that there is no serial correlation (up to order 3) in the residuals.
Question to the test addresses:
Is the mean of a sample significantly different from a hypothesised mean?
To use this test:
data is assumed to be normally distributed
the population standard deviation is unknown
For this test, we will use the built-in dataset
sleep.
drug1_data <- sleep %>%
filter(group == 1)
t.test(drug1_data$extra, mu = 0, alternative = "two.sided")
One Sample t-test
data: drug1_data$extra
t = 1.3257, df = 9, p-value = 0.2176
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
-0.5297804 2.0297804
sample estimates:
mean of x
0.75
According to the result, we can find that the p-value is larger than the critical value 0.05. Therefore, we cannot reject the null hypothesis that the drug has no effect on average sleep time.
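As a sanity check, the one-sample t statistic can be reproduced directly from its definition, t = (mean − mu) / (sd / sqrt(n)):

```r
# Recompute the one-sample t statistic and p-value by hand
x      <- sleep$extra[sleep$group == 1]
t_stat <- (mean(x) - 0) / (sd(x) / sqrt(length(x)))   # mu = 0
p_val  <- 2 * pt(-abs(t_stat), df = length(x) - 1)    # two-sided p-value
round(c(t = t_stat, p = p_val), 4)                    # matches t.test() above
```

Both numbers agree with the t.test() output (t = 1.3257, p = 0.2176).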
Question to the test addresses:
Is the median of a sample significantly different from a hypothesised value?
To use this test:
sample data are assumed not to be normally distributed
a nonparametric test
alternative method to the one-sample t-test
This test is closely related to test 12. In order to practice this test, we first examine the distribution of the sample data we created.
shapiro.test(drug1_data$extra)
According to the result, the p-value is larger than the critical value 0.05. Therefore, we cannot reject the null hypothesis that the sample data are normally distributed. Even though the one-sample t-test would be preferable here, we can still use the Wilcoxon test.
wilcox.test(drug1_data$extra, mu = 0, alternative = "two.sided")
Wilcoxon signed rank test with continuity correction
data: drug1_data$extra
V = 31, p-value = 0.3433
alternative hypothesis: true location is not equal to 0
According to the result, we can find that the p-value is larger than the critical value 0.05. Therefore, we can not reject the null-hypothesis. We get the same conclusions from t-test and Wilcoxon test.
Question to the test addresses:
Is the median of a sample significantly different from a hypothesised median?
To use this test:
nonparametric test
even simpler requirements than the Wilcoxon test
best for descriptive data like better, same, worse
less powerful than the Wilcoxon test and the t-test
We will continue to use the drug data as sample to demonstrate the
coding processes. We will need package BSDA to calculate
this test.
library(BSDA)
SIGN.test(drug1_data$extra, mu=0, alternative = "two.sided")
One-sample Sign-Test
data: drug1_data$extra
s = 5, p-value = 1
alternative hypothesis: true median is not equal to 0
95 percent confidence interval:
-0.8755556 2.9457778
sample estimates:
median of x
0.35
Achieved and Interpolated Confidence Intervals:
Conf.Level L.E.pt U.E.pt
Lower Achieved CI 0.8906 -0.2000 2.0000
Interpolated CI 0.9500 -0.8756 2.9458
Upper Achieved CI 0.9785 -1.2000 3.4000
According to the results, we can find that the p-value is larger than the critical value 0.05. Therefore, we cannot reject the null hypothesis. Comparing the t-test, the Wilcoxon test, and the sign test, notice that the fewer assumptions a test makes, the less statistical power it has, which here shows up as progressively larger p-values.
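Under the hood, the sign test is simply a binomial test on the signs of the differences from the hypothesised median; a sketch with base R's binom.test():

```r
# The sign test as a binomial test: count positive signs relative to mu = 0,
# dropping exact zeros (as SIGN.test does)
x <- sleep$extra[sleep$group == 1]
s <- sum(x > 0)            # number of positive signs
n <- sum(x != 0)           # non-zero observations
binom.test(s, n, p = 0.5)  # H0: positives and negatives are equally likely
```

With 5 positive signs out of 9 non-zero values, this reproduces the s = 5 and p-value of 1 reported by SIGN.test() above.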
Question to the test addresses:
Is the difference between the mean of two samples significantly different from zero?
To use this test:
compare means of two independent groups
each group is assumed to be normally distributed
variances of two groups are equal
In this test, we will use the chickwts dataset which
contains the weights of 71 six-week-old chicks fed by different types of
feed. Let’s test if there is a significant difference in the final
weight of chicks fed by “casein” and
“soybean”
data(chickwts)
head(chickwts)
chicks_compare <- chickwts %>%
filter(feed %in% c("casein","soybean"))
t.test(weight ~ feed, data = chicks_compare, var.equal = TRUE)
Two Sample t-test
data: weight by feed
t = 3.3199, df = 24, p-value = 0.002869
alternative hypothesis: true difference in means between group casein and group soybean is not equal to 0
95 percent confidence interval:
29.18928 125.12024
sample estimates:
mean in group casein mean in group soybean
323.5833 246.4286
According to this result, we can find that the p-value is smaller than the critical value 0.05. Therefore, we can reject the null hypothesis that there is no difference between the weights of the casein-fed and soybean-fed chicks.
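As a quick check of the equal-variance assumption behind var.equal = TRUE, we can run an F test of the two group variances (a sketch; var.test() is in base R's stats package, and droplevels() here just removes the unused feed levels left by the subset):

```r
# F test for equality of variances between the two feed groups
data(chickwts)
chicks_compare <- droplevels(chickwts[chickwts$feed %in% c("casein", "soybean"), ])
res <- var.test(weight ~ feed, data = chicks_compare)  # H0: equal variances
res
```

A non-significant result here supports using the pooled-variance t-test above rather than the Welch variant discussed later.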
Question to the test addresses:
Is the difference between the mean of three or more samples significantly different from zero?
To use this test:
more than two groups of data
all data are assumed to be normally distributed
We will continue to use the chickwts dataset. Now we will compare all the different types of feed at the same time, so we use adjusted p-values for more accurate results. Here we set pool.sd to FALSE for comparison with test 17; remember that we do not need to set pool.sd to FALSE in actual use.
pairwise.t.test(chickwts$weight, chickwts$feed, p.adjust.method = "holm", pool.sd = FALSE)
Pairwise comparisons using t tests with non-pooled SD
data: chickwts$weight and chickwts$feed
casein horsebean linseed meatmeal soybean
horsebean 1.0e-05 - - - -
linseed 0.00261 0.04808 - - -
meatmeal 0.39465 0.00126 0.17599 - -
soybean 0.02817 0.00171 0.59396 0.59396 -
sunflower 0.82151 2.5e-07 0.00031 0.22207 0.00386
P value adjustment method: holm
According to the results, we can see that the p-values vary across the table. For example, we can reject the null hypotheses for casein vs. horsebean and casein vs. linseed, but we cannot reject the null hypotheses for casein vs. meatmeal and casein vs. sunflower.
The reason the p-value for the pair “casein and soybean” is a bit larger than in the example in test 15 is the p-value adjustment for multiple comparisons. The “holm” method increases the p-values slightly to reduce the chance of a Type I error.
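The Holm adjustment is easy to reproduce by hand with p.adjust(); the raw p-values below are made up purely for illustration:

```r
# Holm adjustment: sort the p-values, multiply the i-th smallest by (k - i + 1),
# then enforce monotonicity with a cumulative maximum
raw_p <- c(0.004, 0.019, 0.030)        # hypothetical unadjusted p-values
p.adjust(raw_p, method = "holm")       # here: 0.012, 0.038, 0.038
```

This is less conservative than the Bonferroni method, which would multiply every p-value by the full number of comparisons.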
Question to the test addresses:
Is the difference between the mean of three or more samples significantly different from zero?
To use this test:
variances are equal across all test groups
more powerful than test 16 if the assumption is true
The only change for this test is that we set the pool.sd
to TRUE.
pairwise.t.test(chickwts$weight, chickwts$feed, p.adjust.method = "holm", pool.sd = TRUE)
Pairwise comparisons using t tests with pooled SD
data: chickwts$weight and chickwts$feed
casein horsebean linseed meatmeal soybean
horsebean 2.9e-08 - - - -
linseed 0.00016 0.09435 - - -
meatmeal 0.18227 9.0e-05 0.09435 - -
soybean 0.00532 0.00298 0.51766 0.51766 -
sunflower 0.81249 1.2e-08 8.1e-05 0.13218 0.00298
P value adjustment method: holm
According to the results, we get almost the same answers as in tests 15 and 16. However, some changes are observable. For example, the p-value between casein and soybean is slightly larger than the p-value in test 15 but smaller than the p-value in test 16, where pool.sd is set to FALSE.
The reason the three p-values for the pair “casein and soybean” differ is that different variance estimates are used in the calculation. In test 15, we only use the variability within the “casein” and “soybean” groups. In test 17, we pool the variances across all six groups, which draws on more data. In test 16, each pair is compared separately without assuming equal variances; this is effectively the Welch t-test, which is discussed in test 18.
Question to the test addresses:
Is the difference between the mean of two sample groups significantly different from zero?
To use this test:
The code for this test is the same as in test 16, but with only two groups of feeds.
feeds_to_compare <- chickwts %>%
filter(feed %in% c("casein", "soybean"))
t.test(weight ~ feed, data = feeds_to_compare, var.equal = FALSE)
Welch Two Sample t-test
data: weight by feed
t = 3.2743, df = 21.635, p-value = 0.003521
alternative hypothesis: true difference in means between group casein and group soybean is not equal to 0
95 percent confidence interval:
28.23823 126.07129
sample estimates:
mean in group casein mean in group soybean
323.5833 246.4286
According to the results, we can find that p-value is smaller than the critical value 0.05. Therefore, we can reject the null hypothesis that there is no difference in the final weights of chicks between two types of feeds.
Question to the test addresses:
Is the difference between the mean of two related samples significantly different from zero?
To use this test:
measure the same subject twice, before and after
subjects are assumed to be drawn from a population with a normal distribution
We will use the sleep dataset to examine the drug
effects on sleep duration.
drug1_sleep <- sleep$extra[sleep$group == 1]
drug2_sleep <- sleep$extra[sleep$group == 2]
t.test(drug1_sleep, drug2_sleep, paired = TRUE)
Paired t-test
data: drug1_sleep and drug2_sleep
t = -4.0621, df = 9, p-value = 0.002833
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
-2.4598858 -0.7001142
sample estimates:
mean difference
-1.58
According to the results, we find that the p-value is smaller than the critical value 0.05. Therefore, we can reject the null hypothesis that there is no difference in the mean effect of drug 1 and drug 2 on the same subjects.
Question to the test addresses:
Is the difference between the mean of two related samples significantly different from zero?
To use this test:
data are paired
cannot assume the differences are normally distributed
works on the ranks of the differences
We will still use the sleep dataset in this test.
wilcox.test(drug1_sleep, drug2_sleep, paired = TRUE)
Wilcoxon signed rank test with continuity correction
data: drug1_sleep and drug2_sleep
V = 0, p-value = 0.009091
alternative hypothesis: true location shift is not equal to 0
According to the result, we can find that the p-value is smaller than the critical value 0.05. Therefore, we can reject the null hypothesis that there is no difference in the drug effects of drug 1 and drug 2 on the same person.
This conclusion agrees with the paired t-test in Test 19. The Wilcoxon test works with the ranks of the paired differences rather than their raw values, which makes it less sensitive to outliers and to departures from normality; when the differences are close to normal, the paired t-test is usually the slightly more powerful of the two.
Question to the test addresses:
Is the difference between the mean of two samples significantly different from zero, when tested across multiple pairs?
To use this test:
multiple samples
matched pairs experimental design
subjects are assumed to be normally distributed
We will use the built-in dataset iris to examine the
difference in the length of sepal. The iris dataset
contains measurements for 50 flowers from 3 different species. Let’s
pretend for this exercise that these aren’t three separate species, but
are instead measurements of the same 50 flowers at
three different points in time (Time 1, Time 2, Time 3). This lets us
practice a paired analysis.
pairwise.t.test(iris$Sepal.Length, iris$Species, paired = TRUE, p.adjust.method = "holm")
Pairwise comparisons using paired t tests
data: iris$Sepal.Length and iris$Species
setosa versicolor
versicolor 2.5e-13 -
virginica < 2e-16 3.0e-06
P value adjustment method: holm
According to the result, we can find that the p-values across all pairs are smaller than the critical value 0.05. Therefore, we can reject the null hypothesis that there is no significant difference in the length of sepal between any of the three time points.
Question to the test addresses:
Is the difference between the mean of two samples significantly different from zero, when tested across multiple pairs?
To use this test:
same conditions as in test 21
cannot assume the data are normally distributed
We keep our assumption that we set in test 21.
pairwise.wilcox.test(iris$Sepal.Length, iris$Species, paired = TRUE, p.adjust.method = "holm")
Pairwise comparisons using Wilcoxon signed rank test with continuity correction
data: iris$Sepal.Length and iris$Species
setosa versicolor
versicolor 7.2e-09 -
virginica 3.4e-09 1.9e-05
P value adjustment method: holm
According to this result, we can draw the same conclusion from the p-values.
Question the test addresses:
Is the difference between the medians of two related samples significantly different from zero?
To use this test:
each subject is measured twice, before and after
a matched pairs experimental design
continuous distribution
We will use the sleep dataset. We will also need the
function SIGN.test() from the BSDA package.
if (!require("BSDA")) {
install.packages("BSDA")
library(BSDA)
}
drug1_sleep <- sleep$extra[sleep$group == 1]
drug2_sleep <- sleep$extra[sleep$group == 2]
SIGN.test(drug1_sleep, drug2_sleep)
Dependent-samples Sign-Test
data: drug1_sleep and drug2_sleep
S = 0, p-value = 0.003906
alternative hypothesis: true median difference is not equal to 0
95 percent confidence interval:
-2.2053333 -0.8648889
sample estimates:
median of x-y
-1.3
Achieved and Interpolated Confidence Intervals:
Conf.Level L.E.pt U.E.pt
Lower Achieved CI 0.8906 -1.8000 -1.0000
Interpolated CI 0.9500 -2.2053 -0.8649
Upper Achieved CI 0.9785 -2.4000 -0.8000
According to the result, the p-value is smaller than the significance level 0.05. Therefore, we can reject the null hypothesis that the median difference is zero.
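What SIGN.test() reports can be reproduced with an exact binomial test on the signs of the paired differences; a minimal sketch:

```r
# The sign test: count positive differences and compare against Binomial(n, 0.5).
drug1 <- sleep$extra[sleep$group == 1]
drug2 <- sleep$extra[sleep$group == 2]
diffs <- drug1 - drug2
nonzero <- diffs[diffs != 0]  # ties (zero differences) are dropped
s <- sum(nonzero > 0)         # S = 0: no subject slept more on drug 1
binom.test(s, length(nonzero), p = 0.5)$p.value  # 0.003906, as above
```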
Question the test addresses:
Is the difference between the median of two samples significantly different from zero?
To use this test:
Data are not assumed to be normally distributed
Two independent groups
works with rank of the data
We will use the chickwts dataset one last time to
compare this test with the t-tests.
feeds_to_compare <- chickwts %>%
filter(feed %in% c("casein", "soybean"))
wilcox.test(weight ~ feed, data = feeds_to_compare)
Wilcoxon rank sum test with continuity correction
data: weight by feed
W = 138, p-value = 0.005919
alternative hypothesis: true location shift is not equal to 0
According to the result, the p-value is smaller than the significance level 0.05. Therefore, we can reject the null hypothesis that the two medians are equal.
This test is used to check for randomness in a sequence of binary events. Binary data means that data with only two categories, like Yes/No, or 0/1.
This test detects the runs in a sequence of data. A
run means the uninterrupted sequence of the same event.
For example, in the sequence 1, 1, 1, 0, 0, 1, 0, 0, 0,
there are 4 runs (111, 00, 1,
000). If there are too few or too many runs, it suggests
the sequence isn’t random.
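Base R's rle() ("run length encoding") counts these runs directly; a quick check on the example sequence:

```r
# Each element of rle()$lengths is one run: 111, 00, 1, 000.
seq_x <- c(1, 1, 1, 0, 0, 1, 0, 0, 0)
runs <- rle(seq_x)
runs$lengths          # 3 2 1 3
length(runs$lengths)  # 4 runs in total
```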
We will use the runs.test() from tseries
package.
if (!require("tseries")) {
install.packages("tseries")
library(tseries)
}
binary_sequence <- factor(c(1,1,0,0,1,0,1,1,1,0,0,0))
runs.test(binary_sequence)
Runs Test
data: binary_sequence
Standard Normal = -0.60553, p-value = 0.5448
alternative hypothesis: two.sided
According to this result, the p-value is larger than the significance level 0.05. Therefore, we cannot reject the null hypothesis that our binary sequence is random.
This test extends Test 25 to continuous data. Question the test addresses:
Is the sequence of observations in a sample randomly distributed?
It works for continuous data.
We will use the runs.test() from the
lawstat package for this.
if (!require("lawstat")) {
install.packages("lawstat")
library(lawstat)
}
continuous_sequence <- c(1.8, 2.3, 1.5, 2.9, 1.1, 3.5, 0.9)
runs.test(continuous_sequence)
Runs Test - Two sided
data: continuous_sequence
Standardized Runs Statistic = 2.7386, p-value = 0.00617
According to this result, the p-value is smaller than the significance level 0.05. Therefore, we can reject the null hypothesis that our continuous sequence is random.
Question the test addresses:
Is the sequence of observations in a sample randomly distributed?
The Bartels test is a powerful alternative to the Wald-Wolfowitz runs test for continuous data. It is based on the ranks of the data rather than just their position relative to the median.
The function is bartels.test() from the same
lawstat package. We will use the same continuous sequence from
Test 26 in this test.
bartels.test(continuous_sequence)
Bartels Test - Two sided
data: continuous_sequence
Standardized Bartels Statistic = 1.6536, RVN Ratio = 3.25, p-value =
0.09821
According to the result, the p-value is larger than the significance level 0.05. Therefore, we cannot reject the null hypothesis that our continuous sequence is random.
The Wald-Wolfowitz test gave a p-value of 0.006 (significant), while the Bartels test gave 0.098 (not significant). Although the Bartels test is often more powerful, that is not a universal rule: different tests are sensitive to different kinds of non-random patterns. In our example, with a very short sequence, the way the two tests calculate their statistics led them to different conclusions.
Both the Ljung-Box and Box-Pierce tests check for autocorrelation in a time series.
Question these tests address:
Is the sequence of observations in a sample randomly distributed?
These tests check if the autocorrelations between the series and its past values are all zero. The Ljung-Box test is a refinement of the Box-Pierce test and is generally preferred for small sample sizes.
We will use the built-in Nile dataset, which contains
measurements of the flow of the river Nile from 1871 to 1970.
plot(Nile)
Box.test(Nile, lag = 10, type = "Ljung-Box")
Box-Ljung test
data: Nile
X-squared = 88.127, df = 10, p-value = 1.255e-14
Box.test(Nile, lag =10, type = "Box-Pierce")
Box-Pierce test
data: Nile
X-squared = 83.229, df = 10, p-value = 1.166e-13
According to these results, both p-values are smaller than the significance level 0.05. Therefore, we can reject the null hypothesis that the measurements of the river Nile are random.
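Both statistics can be reproduced from the sample autocorrelations; a sketch, assuming the standard formulas Q_BP = n * sum(r_k^2) and Q_LB = n(n+2) * sum(r_k^2 / (n - k)):

```r
# Hand-compute the Box-Pierce and Ljung-Box statistics for lag = 10.
n <- length(Nile)                                   # 100 annual observations
r <- acf(Nile, lag.max = 10, plot = FALSE)$acf[-1]  # drop the lag-0 term
Q_bp <- n * sum(r^2)                         # Box-Pierce: 83.229
Q_lb <- n * (n + 2) * sum(r^2 / (n - 1:10))  # Ljung-Box: 88.127
```

The (n + 2)/(n - k) factor is the small-sample correction that makes the Ljung-Box statistic preferable for short series.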
Question the test addresses:
Is a time series independent and identically distributed?
To use this test:
detect complex and non-linear patterns
single time series
We will continue to use the Nile dataset and check if
there is any non-random or non-linear patterns.
bds.test(Nile)
BDS Test
data: Nile
Embedding dimension = 2 3
Epsilon for close points = 84.6138 169.2275 253.8413 338.4550
Standard Normal =
[ 84.6138 ] [ 169.2275 ] [ 253.8413 ] [ 338.455 ]
[ 2 ] 2.9223 6.9347 6.7724 5.6218
[ 3 ] 1.9868 7.4144 7.6433 6.5196
p-value =
[ 84.6138 ] [ 169.2275 ] [ 253.8413 ] [ 338.455 ]
[ 2 ] 0.0035 0 0 0
[ 3 ] 0.0469 0 0 0
According to the result, all p-values are smaller than the significance level 0.05. Therefore, we can reject the null hypothesis that the Nile dataset is independent and identically distributed. This suggests the patterns in the Nile data are more complex and possibly non-linear.
Question the test addresses:
Do two independent samples come from populations having the same distribution?
To use this test:
combine the two samples and rank them
apply the runs test to the ordered group labels
We will use the Nile dataset to see whether the first 50 years are different from the last 50 years.
nile_first_50 <- Nile[1:50]
nile_last_50 <- Nile[51:100]
plot(nile_first_50)
plot(nile_last_50)
combined_nile <- c(nile_first_50,nile_last_50)
group_labels <- factor(c(rep("first", 50), rep("last", 50)))
sorted_groups <- group_labels[order(combined_nile)]
tseries::runs.test(sorted_groups)
According to the result, the p-value is smaller than the significance level 0.05. Therefore, we can reject the null hypothesis that there is no difference between the first 50 years and the last 50 years. The test suggests that the two samples are not randomly mixed, which means the distribution of the Nile’s flow in the first 50 years is likely different from its distribution in the last 50 years.
Question the test addresses:
Do two independent samples come from the same distribution?
To use this test:
two independent samples
compares the scale (spread) of the two distributions
We will compare the spread of weights for chicks fed “casein” versus those fed “sunflower”. Visually, their average weights look similar, but is the consistency of the weight gain the same? We can use the mood.test() function.
casein_v_sunflower <- chickwts %>%
filter(feed %in% c("casein", "sunflower"))
ggplot(casein_v_sunflower, aes(x = feed, y = weight, fill = feed)) +
geom_boxplot() +
labs(title = "Comparison of Chick Weights by Feed Type",
x = "Feed Type",
y = "Weight (grams)") +
theme_minimal()
mood.test(weight ~ feed, data = casein_v_sunflower)
Mood two-sample test of scale
data: weight by feed
Z = 1.3501, p-value = 0.177
alternative hypothesis: two.sided
According to the result, the p-value is larger than the significance level 0.05. Therefore, we cannot reject the null hypothesis that these two independent samples come from the same distribution.
Question the test addresses:
Are the variances of two samples equal?
To use this test:
highly sensitive to the data being normally distributed
parametric test, very powerful
We will use the F-test on the same two groups: “casein”
vs. “sunflower”. This will let us compare the result from a parametric
test (F-test) to the nonparametric one (Mood’s test). The function for
F-test is var.test().
casein_v_sunflower <- chickwts %>%
filter(feed %in% c("casein", "sunflower"))
var.test(weight ~ feed, data = casein_v_sunflower)
F test to compare two variances
data: weight by feed
F = 1.7408, num df = 11, denom df = 11, p-value = 0.3718
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
0.5011282 6.0469059
sample estimates:
ratio of variances
1.740769
According to the result, the p-value is larger than the significance level 0.05. Therefore, we cannot reject the null hypothesis that the variances of the casein group and the sunflower group are equal.
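The F statistic reported by var.test() is simply the ratio of the two sample variances; a quick check:

```r
# F = var(first group) / var(second group); 12 chicks per group gives 11 df each.
casein    <- chickwts$weight[chickwts$feed == "casein"]
sunflower <- chickwts$weight[chickwts$feed == "sunflower"]
var(casein) / var(sunflower)  # 1.740769, as reported above
```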
Question the test addresses:
Are the variances of two correlated samples equal?
To use this test:
paired data
tests the correlation between the pairwise sums and differences
highly sensitive to the data being normally distributed
We will use sleep dataset again, as it contains paired
data (each of the 10 subjects was tested on two different drugs). We can
test if the variance in “extra sleep” was different for Drug 1 compared
to Drug 2.
The function for this is pitman.morgan.test()
from the PairedData package.
if (!require("PairedData")) {
install.packages("PairedData")
library(PairedData)
}
pitman.morgan.test(drug1_sleep, drug2_sleep)
Paired Pitman-Morgan test
data: drug1_sleep and drug2_sleep
t = -0.52636, df = 8, p-value = 0.6129
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
0.3080221 2.0691727
sample estimates:
variance of x variance of y
3.200556 4.009000
According to the result, the p-value is larger than the significance level 0.05. Therefore, we cannot reject the null hypothesis that the variances for drug 1 and drug 2 are equal.
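A classical identity underlies the Pitman-Morgan test: the paired variances are equal exactly when the pairwise sums and differences are uncorrelated, so the same t statistic can be obtained from cor.test():

```r
# cov(X + Y, X - Y) = var(X) - var(Y), so testing this correlation
# against zero tests the equality of the two variances.
d1 <- sleep$extra[sleep$group == 1]
d2 <- sleep$extra[sleep$group == 2]
cor.test(d1 + d2, d1 - d2)  # t = -0.52636 on 8 df, matching the output above
```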
This series of tests aims to answer the same question:
Do multiple independent samples come from populations with equal variances?
To use this test:
data are assumed to be continuous
data are measured on at least an ordinal scale
We will work through this series with the chickwts
dataset, which has six feed groups. The Ansari-Bradley test compares
two samples, so we start with the “casein” and “sunflower” groups; the
later tests in this series compare the variances across all six groups.
casein_v_sunflower <- chickwts %>%
filter(feed %in% c("casein", "sunflower"))
ansari.test(weight ~ feed, data = casein_v_sunflower)
Ansari-Bradley test
data: weight by feed
AB = 64.5, p-value = 0.1177
alternative hypothesis: true ratio of scales is not equal to 1
According to the result, the p-value is larger than the significance level 0.05. Therefore, we cannot reject the null hypothesis that these two samples have the same variance.
To use this test:
three or more independent groups
data are assumed to be normally distributed (the test is sensitive to departures from normality)
We will use the chickwts dataset to compare the
variances across all feeds.
bartlett.test(weight ~ feed, data = chickwts)
Bartlett test of homogeneity of variances
data: weight by feed
Bartlett's K-squared = 3.2597, df = 5, p-value = 0.66
According to the results, the p-value is larger than the significance level 0.05. Therefore, we cannot reject the null hypothesis that the variances of all feed groups are equal.
nonparametric test
more robust
data need not be assumed to be normally distributed
We will compare the results from the Fligner-Killeen test and the Bartlett test to see whether anything differs.
fligner.test(weight ~ feed, data = chickwts)
Fligner-Killeen test of homogeneity of variances
data: weight by feed
Fligner-Killeen:med chi-squared = 3.8109, df = 5, p-value = 0.577
According to the results, we reach exactly the same conclusion as with the Bartlett test.
To use this test:
two or more groups
robust to departures from normality
The function leveneTest() is in the car
package, which we’ve used before. Let’s run it on our
chickwts data to see if our conclusion holds a third
time.
if (!require("car")) {
install.packages("car")
library(car)
}
leveneTest(weight ~ feed, data = chickwts)
Levene's Test for Homogeneity of Variance (center = median)
Df F value Pr(>F)
group 5 0.7493 0.5896
65
According to the results, we reach the same conclusion a third time.
Question the test addresses:
Is the single largest variance among a group of samples an outlier?
To use this test:
equal sample sizes across all groups
tests whether the largest variance is an outlier
We’ll use the InsectSprays data and the
cochran.test() function from the outliers
package.
if (!require("outliers")) {
install.packages("outliers")
library(outliers)
}
summary(InsectSprays)
count spray
Min. : 0.00 A:12
1st Qu.: 3.00 B:12
Median : 7.00 C:12
Mean : 9.50 D:12
3rd Qu.:14.25 E:12
Max. :26.00 F:12
cochran.test(count ~ spray, data = InsectSprays)
Cochran test for outlying variance
data: count ~ spray
C = 0.41832, df = 12, k = 6, p-value = 0.004435
alternative hypothesis: Group F has outlying variance
sample estimates:
A B C D E F
22.272727 18.242424 3.901515 6.265152 3.000000 38.606061
According to the results, the p-value is smaller than the significance level 0.05. Therefore, we can reject the null hypothesis that no group variance is an outlier. The variance of spray F is significantly larger than the variances of the other sprays.
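Cochran's C statistic itself is easy to verify by hand: it is the largest group variance divided by the sum of all group variances.

```r
# C = max(s_i^2) / sum(s_i^2); spray F contributes the largest variance.
group_vars <- tapply(InsectSprays$count, InsectSprays$spray, var)
max(group_vars) / sum(group_vars)  # C = 0.41832, as reported above
```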
uses medians for the test
more robust and general-purpose
We will run this test on InsectSprays data to see what
it says about the overall equality of variances. The function is
levene.test() from the lawstat package, and we
specify location = "median" to make it a Brown-Forsythe
test.
levene.test(InsectSprays$count, InsectSprays$spray, location = "median")
Modified robust Brown-Forsythe Levene-type test based on the absolute
deviations from the median
data: InsectSprays$count
Test Statistic = 3.8214, p-value = 0.004223
According to the result, the p-value is smaller than the significance level 0.05. We can reject the null hypothesis that the variances across all groups are the same.
Question the test addresses:
Are the variances of the differences between all possible pairs of groups in a repeated measures analysis of variance equal?
To use this test:
the variances of the differences between comparison groups need to be equal (sphericity)
repeated measures ANOVA design
We will use ChickWeight and sleep to
continue practising this test.
sleep_wide <- reshape(sleep, idvar = "ID", timevar = "group", direction = "wide", sep = "_")
sleep_matrix <- as.matrix(sleep_wide[,c('extra_1', 'extra_2')])
lm_sleep <- lm(sleep_matrix ~ 1)
mauchly.test(lm_sleep, X = ~1)
Mauchly's test of sphericity
Contrasts orthogonal to
~1
data: SSD matrix from lm(formula = sleep_matrix ~ 1)
W = 1, p-value = 1
chick_subset <- ChickWeight[ChickWeight$Time %in% c(0,2,4,6),]
chick_subset <- chick_subset[chick_subset$Chick %in% 1:10,]
chick_wide <- reshape(chick_subset, idvar = "Chick", timevar = "Time", direction = "wide", sep = "_")
chick_matrix <- as.matrix(chick_wide[,c("weight_0", "weight_2", "weight_4", "weight_6")])
chick_matrix <- na.omit(chick_matrix)
lm_chick <- lm(chick_matrix ~ 1)
mauchly.test(lm_chick, X= ~1)
Mauchly's test of sphericity
Contrasts orthogonal to
~1
data: SSD matrix from lm(formula = chick_matrix ~ 1)
W = 0.53701, p-value = 0.4439
According to the results, the p-value in practice 1 is 1 and the p-value in practice 2 is 0.4439; both are larger than the significance level 0.05. In practice 1 there are only two conditions, so there is only one set of differences and sphericity holds trivially (W = 1). In practice 2, we cannot reject the null hypothesis that the variances of the differences between comparison groups are equal.
Question the test addresses:
Does the proportion of individuals falling into each category differ from chance? Or does it differ from some pre-specified probabilities of falling into those categories?
A simple way to understand this test is that if we flip a coin 30 times and get 25 heads, this test can tell us the exact probability of that happening with a fair coin.
binom.test(x = 25, n = 30, p = 0.5)
Exact binomial test
data: 25 and 30
number of successes = 25, number of trials = 30, p-value = 0.0003249
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
0.6527883 0.9435783
sample estimates:
probability of success
0.8333333
According to the result, the p-value is smaller than the significance level 0.05. Therefore, we can reject the null hypothesis that the true probability of heads is 0.5. The 95% confidence interval for the true probability runs from 0.653 to 0.944.
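Because p = 0.5 makes the binomial distribution symmetric, the exact two-sided p-value is just twice one tail; a quick check:

```r
# P(X >= 25) + P(X <= 5) = 2 * P(X <= 5) under Binomial(30, 0.5).
2 * pbinom(5, size = 30, prob = 0.5)  # 0.0003249, as reported above
```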
Question the test addresses:
Is the observed proportion from a random experiment equal to some pre-specified probability?
This test is similar to the binomial test and is suitable for large sample sizes. Suppose 100 people take a survey and 52 of them prefer option A. We can test whether this is significantly different from a 50% market share.
prop.test(x = 52, n = 100, p =0.5)
1-sample proportions test with continuity correction
data: 52 out of 100, null probability 0.5
X-squared = 0.09, df = 1, p-value = 0.7642
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
0.4183183 0.6201278
sample estimates:
p
0.52
According to this result, the p-value is larger than the significance level 0.05. Therefore, we cannot reject the null hypothesis that our survey result is consistent with a 50% market share.
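The statistic behind prop.test() is a continuity-corrected chi-squared; reproducing it by hand:

```r
# X-squared = (|x - n*p0| - 0.5)^2 / (n * p0 * (1 - p0)), with the Yates correction.
x <- 52; n <- 100; p0 <- 0.5
(abs(x - n * p0) - 0.5)^2 / (n * p0 * (1 - p0))  # 0.09, as reported above
```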
Question the test addresses:
Is the rate parameter of a Poisson distributed sample significantly different from a hypothesized value?
To use this test:
counts of events over a known exposure or expected count
events are assumed to follow a Poisson distribution
Researchers observed 6 cases of a particular cancer in a group of relatives, but based on the general population, they only expected 6.2 cases. Is the observed number significantly different from the expected number?
poisson.test(x = 6, T = 6.2)
Exact Poisson test
data: 6 time base: 6.2
number of events = 6, time base = 6.2, p-value = 1
alternative hypothesis: true event rate is not equal to 1
95 percent confidence interval:
0.3551442 2.1063668
sample estimates:
event rate
0.9677419
According to the result, the p-value is larger than the significance level 0.05. Therefore, we cannot reject the null hypothesis that the observed count is consistent with the expected count.
Question the test addresses:
Is the difference between the pairwise proportions in three or more samples significant?
To use this test:
three or more independent samples
a binary outcome in each sample
Suppose we have data from six different hospital wards on the proportion of patients who recovered after a treatment.
| Ward | Recovered | Not Recovered |
|------|-----------|---------------|
| s1 | 95 | 106 |
| s2 | 181 | 137 |
| s3 | 76 | 85 |
| s4 | 13 | 29 |
| s5 | 11 | 26 |
| s6 | 201 | 179 |
recovery_data <- matrix(c(95, 106, 181, 137, 76, 85, 13, 29, 11, 26, 201, 179),
ncol = 2, byrow = TRUE)
colnames(recovery_data) <- c("Recovered", "Not Recovered")
rownames(recovery_data) <- c("s1", "s2", "s3", "s4", "s5", "s6")
pairwise.prop.test(recovery_data)
Pairwise comparisons using Pairwise comparison of proportions
data: recovery_data
s1 s2 s3 s4 s5
s2 0.437 - - - -
s3 1.000 0.553 - - -
s4 0.658 0.039 0.658 - -
s5 0.658 0.042 0.658 1.000 -
s6 1.000 1.000 1.000 0.146 0.146
P value adjustment method: holm
According to the result, only the p-values for s2 vs s4 and s2 vs s5 are smaller than the significance level 0.05. Therefore, we can reject the null hypothesis of equal proportions only for these two comparisons.
Question the test addresses:
Is the rate parameter of a Poisson distributed sample significantly different from another?
This is the two-sample version of the Poisson test.
To use this test:
two independent samples
compare the rate of events
We will test two samples:
Sample 1: Observed 10 events over a “time base” or exposure of 20,000 units.
Sample 2: Observed 2 events over a “time base” or exposure of 17,877 units.
poisson.test(c(10,2),c(20000,17877))
Comparison of Poisson rates
data: c(10, 2) time base: c(20000, 17877)
count1 = 10, expected count1 = 6.3363, p-value = 0.04213
alternative hypothesis: true rate ratio is not equal to 1
95 percent confidence interval:
0.9524221 41.9509150
sample estimates:
rate ratio
4.46925
According to the result, the p-value is smaller than the significance level 0.05. Therefore, we can reject the null hypothesis that the two rate parameters are equal; the event rates of the two samples differ significantly.
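As documented in ?poisson.test, the two-sample comparison is carried out as an exact binomial test: conditional on the total count, the first count is binomial with p = T1 / (T1 + T2).

```r
# The two-sample Poisson test reduces to binom.test(x1, x1 + x2, T1 / (T1 + T2)).
binom.test(10, 10 + 2, p = 20000 / (20000 + 17877))$p.value  # 0.04213, as above
```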
Question the test addresses:
Is the difference between the observed proportions from two or more samples significantly different from zero?
To use this test:
two or more independent groups
compare the proportions of successes
We will use two groups to practice this test:
Group 1 (Male Faculty): 18 out of 30 were trained at top-tier schools (18 successes, 30 trials).
Group 2 (Female Faculty): 17 out of 24 were trained at top-tier schools (17 successes, 24 trials).
prop.test(c(18,17),c(30,24))
2-sample test for equality of proportions with continuity correction
data: c(18, 17) out of c(30, 24)
X-squared = 0.29335, df = 1, p-value = 0.5881
alternative hypothesis: two.sided
95 percent confidence interval:
-0.3984195 0.1817529
sample estimates:
prop 1 prop 2
0.6000000 0.7083333
According to the result, the p-value is larger than the significance level 0.05. Therefore, we cannot reject the null hypothesis that the proportion of successes is the same in the two groups.
Question the test addresses:
Is there a linear trend in the proportions across multiple ordered categories?
To use this test:
three or more ordered groups
find a linear trend in the proportions of success
We have three groups with increasing dosage levels.
Group 1 (10mg): 10 successes out of 50 trials.
Group 2 (20mg): 17 successes out of 50 trials.
Group 3 (30mg): 25 successes out of 50 trials.
success <- c(10, 17, 25)
totals <- c(50, 50, 50)
prop.trend.test(success, totals)
Chi-squared Test for Trend in Proportions
data: success out of totals ,
using scores: 1 2 3
X-squared = 9.9343, df = 1, p-value = 0.001622
According to the result, the p-value is smaller than the significance level 0.05. Therefore, we can reject the null hypothesis that there is no linear trend.
Question the test addresses:
Are the paired observations on two variables in a contingency table independent of each other?
To use this test:
We have a contingency table showing the voting patterns of 100 citizens, categorized by gender and political party.
| Gender | Labour | Conservative |
|--------|--------|--------------|
| Male | 20 | 30 |
| Female | 30 | 20 |
We want to test if a person’s gender is independent of their choice of political party.
voting_data <- as.table(rbind(c(20, 30), c(30, 20)))
dimnames(voting_data) <- list(gender = c("Male", "Female"), party = c("Labour", "Conservative"))
chisq.test(voting_data, correct = FALSE)
Pearson's Chi-squared test
data: voting_data
X-squared = 4, df = 1, p-value = 0.0455
According to the result, the p-value is smaller than the significance level 0.05. Therefore, we can reject the null hypothesis that a person’s gender is independent of their choice of party.
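The statistic is easy to verify by hand: expected counts come from the row and column totals, and here every cell expects 25.

```r
# X-squared = sum((O - E)^2 / E); each cell deviates from its expected 25 by 5.
observed <- matrix(c(20, 30, 30, 20), nrow = 2, byrow = TRUE)
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
sum((observed - expected)^2 / expected)  # 4, as reported above
```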
Question the test addresses:
Are the paired observations on two variables in a contingency table independent of each other?
To use this test:
small sample size
two categorical variables
We have a contingency table showing the voting patterns of 100 citizens, categorized by gender and political party.
| Gender | Labour | Conservative |
|--------|--------|--------------|
| Male | 2 | 3 |
| Female | 3 | 2 |
We want to test if a person’s gender is independent of their choice of political party.
voting_data <- as.table(rbind(c(2, 3), c(3, 2)))
dimnames(voting_data) <- list(gender = c("Male", "Female"), party = c("Labour", "Conservative"))
fisher.test(voting_data)
Fisher's Exact Test for Count Data
data: voting_data
p-value = 1
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
0.01931867 9.42971111
sample estimates:
odds ratio
0.4831013
According to the result, the p-value is larger than the significance level 0.05. Therefore, we cannot reject the null hypothesis: there is no significant association between a person’s gender and their choice of political party.
Question the test addresses:
Is there a relationship between two categorical variables after adjusting for control variables?
To use this test:
More advanced chi-squared test
test relationships between two categorical variables and control for a third categorical variable
We will create a treatment example from the book. We want to see if a treatment (“Drug” vs. “Placebo”) is associated with a response (“Improved” vs. “No Change”), while controlling for sex (“Male” vs. “Female”).
treatment_data <- array(
c(12,16,7,19,
16,11,5,20),
dim = c(2,2,2),
dimnames = list(
Treatment = c("Drug","Placebo"),
Response = c("Improved","No Change"),
Sex = c("Male","Female")
)
)
ftable(treatment_data)
Sex Male Female
Treatment Response
Drug Improved 12 16
No Change 7 5
Placebo Improved 16 11
No Change 19 20
mantelhaen.test(treatment_data)
Mantel-Haenszel chi-squared test with continuity correction
data: treatment_data
Mantel-Haenszel X-squared = 7.1983, df = 1, p-value = 0.007297
alternative hypothesis: true common odds ratio is not equal to 1
95 percent confidence interval:
1.445613 7.593375
sample estimates:
common odds ratio
3.313168
According to the result, the p-value is smaller than the significance level 0.05. Therefore, we can reject the null hypothesis that there is no relationship between the treatment and the response, even after controlling for sex.
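The common odds ratio reported above is the Mantel-Haenszel pooled estimate, sum(a_i d_i / n_i) / sum(b_i c_i / n_i) over the strata; a sketch of the computation (mh_or is a hypothetical helper name):

```r
# Pool the 2x2 odds ratios across the Male and Female strata.
male   <- matrix(c(12, 7, 16, 19), nrow = 2, byrow = TRUE)  # rows: Drug, Placebo
female <- matrix(c(16, 5, 11, 20), nrow = 2, byrow = TRUE)  # cols: Improved, No Change
mh_or <- function(strata) {
  num <- sum(sapply(strata, function(t) t[1, 1] * t[2, 2] / sum(t)))
  den <- sum(sapply(strata, function(t) t[1, 2] * t[2, 1] / sum(t)))
  num / den
}
mh_or(list(male, female))  # 3.3132, matching the common odds ratio above
```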
Question the test addresses:
Is there a difference between paired proportions?
To use this test:
paired or matched categorical data
the same subject is measured twice and the outcome is binary (yes/no)
suitable for “before and after” test
From the book, we create a table for the diagnostic tests. We are comparing two diagnostic tests for tuberculosis (sputum culture & chest radiography) on the same group of patients.
diagnostic_data <- matrix(
c(59,4,128,20),
nrow = 2,
dimnames = list(
"Radiography" = c("Positive","Negative"),
"Sputum" = c("Positive", "Negative"))
)
ftable(diagnostic_data)
Sputum Positive Negative
Radiography
Positive 59 128
Negative 4 20
mcnemar.test(diagnostic_data)
McNemar's Chi-squared test with continuity correction
data: diagnostic_data
McNemar's chi-squared = 114.61, df = 1, p-value < 2.2e-16
According to the result, the p-value is smaller than the significance level 0.05. Therefore, we can reject the null hypothesis that the two tests have the same proportion of positive results.
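McNemar's statistic uses only the discordant cells (b = 128, c = 4); the continuity-corrected value is easy to check:

```r
# chi-squared = (|b - c| - 1)^2 / (b + c), with the continuity correction.
b_disc <- 128  # Radiography positive, Sputum negative
c_disc <- 4    # Radiography negative, Sputum positive
(abs(b_disc - c_disc) - 1)^2 / (b_disc + c_disc)  # 114.61, as reported above
```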
Question the test addresses:
Do three or more samples come from populations with the same mean?
To use this test:
extension of the independent two-sample t-test (Test 15: Two-sample t-test for the Difference in Sample Means)
three or more groups of samples
assume data to be normally distributed
assume variances across different groups to be equal
We will come back to the chickwts dataset (let’s say
“thank you, chicks!”) and compare the effects of six different feeds.
oneway.test(weight ~ feed, data= chickwts, var.equal = TRUE)
One-way analysis of means
data: weight and feed
F = 15.365, num df = 5, denom df = 65, p-value = 5.936e-10
According to the result, the p-value is smaller than the significance level 0.05. Therefore, we can reject the null hypothesis that there is no difference in the mean weight of chicks across the six feed groups.
The ANOVA test tells us that at least one group is different from the others, but it doesn’t tell us which specific groups are different. To find that out, we would need to perform a follow-up test, like Test 17 Pairwise t-test with Common Variance.
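The same analysis is often run through aov(), whose fitted object also feeds built-in follow-up comparisons such as TukeyHSD() (an alternative to the pairwise t-tests mentioned above); a sketch:

```r
# Equivalent one-way ANOVA via aov(); F = 15.365 on 5 and 65 df.
fit <- aov(weight ~ feed, data = chickwts)
summary(fit)
TukeyHSD(fit)  # which specific pairs of feeds differ
```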
Question the test addresses:
Do three or more samples come from populations with the same mean?
To use this test:
do not assume equal variances across all groups
more robust alternative to the standard ANOVA test
Therefore, we only need to set
var.equal = FALSE.
oneway.test(weight ~ feed, data= chickwts, var.equal = FALSE)
One-way analysis of means (not assuming equal variances)
data: weight and feed
F = 19.662, num df = 5.000, denom df = 29.952, p-value = 1.177e-08
According to the result, the p-value is smaller than the significance level 0.05. Therefore, we can reject the null hypothesis. Comparing with the result from Test 53, Welch’s ANOVA has a higher F value and a larger p-value than the standard ANOVA, as it relies on fewer assumptions.
Question the test addresses:
Do three or more samples come from populations with the same mean?
To use this test:
nonparametric equivalent of the one-way ANOVA
central tendency of three or more independent groups
do not assume the normality assumption of ANOVA
the extension of the Test 24 Wilcoxon Rank Sum Test for the Difference in Medians
kruskal.test(weight ~ feed, data = chickwts)
Kruskal-Wallis rank sum test
data: weight by feed
Kruskal-Wallis chi-squared = 37.343, df = 5, p-value = 5.113e-07
According to the result, the p-value is smaller than the significance level 0.05. Therefore, we can reject the null hypothesis.
The Kruskal-Wallis rank sum test has even fewer constraints; accordingly, its p-value is larger than that of Welch’s ANOVA.
Question the test addresses:
Are the distributions from various groups the same across repeated measures?
To use this test:
a nonparametric alternative to repeated measures ANOVA
three or more related/paired groups of samples
does not assume the data to be normally distributed
extension of Test 20 Matched Pairs Wilcoxon Test
Friedman Test in R requires matrix data. Therefore, we will create the sample data based on the book.
diet_data <- matrix(c(
8, 8, 7, 7, 6, 6, 6, 8, 6, 8, 9, 7,
5, 8, 5, 9, 7, 7, 7, 7, 7, 8, 7, 7,
8, 6, 8, 7, 6, 6, 7, 8, 6, 9, 9, 6
), nrow = 12, byrow = TRUE,
dimnames = list(
Subject = 1:12,
Diet = c("Healthy Balanced","Low Fat","Low Carb")
))
ftable(diet_data)
Diet Healthy Balanced Low Fat Low Carb
Subject
1 8 8 7
2 7 6 6
3 6 8 6
4 8 9 7
5 5 8 5
6 9 7 7
7 7 7 7
8 8 7 7
9 8 6 8
10 7 6 6
11 7 8 6
12 9 9 6
friedman.test(diet_data)
Friedman rank sum test
data: diet_data
Friedman chi-squared = 7.6, df = 2, p-value = 0.02237
According to the result, the p-value is smaller than the significance level 0.05. Therefore, we can reject the null hypothesis that there is no difference in the perceived energy levels across the three diets.
Question the test addresses:
Are the distributions from various groups the same across repeated measures?
To use this test:
related (paired) samples
more powerful than the Friedman test
small number of groups
quade.test(diet_data)
Quade test
data: diet_data
Quade F = 3.9057, num df = 2, denom df = 22, p-value = 0.03535
According to the result, the p-value is smaller than the significance level 0.05. Therefore, we can reject the null hypothesis that there is no difference in the perceived energy levels across the three diets.
Question the test addresses:
Is the data skewed (symmetric, or with a long tail on one side)?
To use this test:
checks the normality assumption
looks for skewness (asymmetry, a long tail on one side)
if (!require("moments")) {
install.packages("moments")
library(moments)
}
set.seed(123)
normal_data <- rnorm(100)
plot(normal_data)
agostino.test(normal_data)
D'Agostino skewness test
data: normal_data
skew = 0.060499, z = 0.262698, p-value = 0.7928
alternative hypothesis: data have a skewness
According to the result, the p-value is larger than the significance level 0.05. Therefore, we cannot reject the null hypothesis that the data are not skewed.
Question the test addresses:
Does the sample exhibit more (or less) kurtosis relative to the normal distribution?
Kurtosis: Does the data have “heavy” or “light” tails compared to a normal distribution?
A distribution with high kurtosis has heavier tails (meaning more outliers are likely), while a distribution with low kurtosis has lighter tails.
if (!require("moments")) {
install.packages("moments")
library(moments)
}
set.seed(123)
normal_data <- rnorm(100)
plot(normal_data)
anscombe.test(normal_data)
Anscombe-Glynn kurtosis test
data: normal_data
kurt = 2.838947, z = -0.059837, p-value = 0.9523
alternative hypothesis: kurtosis is not equal to 3
According to the result, the p-value (0.95) is larger than the 0.05 significance level. Therefore, we cannot reject the null hypothesis that the data have the same kurtosis as a normal distribution.
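For contrast, consider a heavy-tailed sample. The t distribution with 3 degrees of freedom is our own illustrative choice (not from the book); it typically produces sample kurtosis well above 3, which the test should flag.

```r
library(moments)
set.seed(123)
heavy_data <- rt(100, df = 3)   # heavy-tailed by construction
kurtosis(heavy_data)            # typically well above 3
anscombe.test(heavy_data)       # a small p-value is expected for heavy tails
```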
Question the test addresses:
Does the sample exhibit more (or less) kurtosis relative to the normal distribution?
To use this test:
uses Geary’s measure of kurtosis
running multiple tests of the same property increases confidence in the conclusion
We reuse the previously generated normal_data here.
bonett.test(normal_data)
Bonett-Seier test for Geary kurtosis
data: normal_data
tau = 0.72928, z = -0.23812, p-value = 0.8118
alternative hypothesis: kurtosis is not equal to sqrt(2/pi)
According to the result, the p-value (0.81) is larger than the 0.05 significance level. Therefore, we cannot reject the null hypothesis that the data have the same kurtosis as a normal distribution.
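Geary’s measure is the mean absolute deviation divided by the standard deviation; under normality it equals sqrt(2/pi) ≈ 0.798, which is the value named in the alternative hypothesis above. A quick hand computation (a sketch using population-style denominators, i.e. dividing by n):

```r
# Geary's kurtosis measure: mean absolute deviation over the standard
# deviation, both computed with denominator n (population-style).
geary <- function(x) {
  n <- length(x)
  mean_abs_dev <- sum(abs(x - mean(x))) / n
  pop_sd <- sqrt(sum((x - mean(x))^2) / n)
  mean_abs_dev / pop_sd
}
geary(normal_data)   # close to the normal-theory value for our sample
sqrt(2 / pi)         # 0.7978846
```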
Now let’s visualise the sample assessed by these three tests.
df_normal_data <- data.frame(normal_data = normal_data)
ggplot(data = df_normal_data, aes(x = normal_data)) +
geom_histogram(aes(y = after_stat(density)), bins = 10, fill = "skyblue", color = "black") +
geom_density(color = "blue", linewidth = 1) +
stat_function(fun = dnorm, args = list(mean = mean(normal_data), sd = sd(normal_data)), color = "red", linewidth = 1, linetype = "dashed") +
labs(title = "Histogram with Normal Curve Overlay",
x = "Value", y = "Density") +
theme_minimal()
ggplot(df_normal_data, aes(sample = normal_data)) +
stat_qq() +
stat_qq_line(color = "red", linetype = "dashed") +
labs(title = "Q-Q Plot for Normality",
x = "Theoretical Quantiles", y = "Sample Quantiles") +
theme_minimal()
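In the spirit of running multiple tests of the same property, an omnibus normality test such as Shapiro-Wilk can complement the separate skewness and kurtosis checks. This is our own addition, not one of the book's three tests above.

```r
# Shapiro-Wilk assesses departure from normality as a whole, rather than
# skewness or kurtosis separately; a large p-value here is consistent
# with the moment-based tests above.
shapiro.test(normal_data)
```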